Dawn Song

University of California, Berkeley

BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

May 31, 2026
Via arXiv

SCDBench: A Benchmark for LLM-Based Smart Contract Decompilers

May 27, 2026
Via arXiv

Measuring Real-World Prompt Injection Attacks in LLM-based Resume Screening

May 27, 2026
Via arXiv

MemFail: Stress-Testing Failure Modes of LLM Memory Systems

May 26, 2026
Via arXiv

Do Androids Dream of Breaking the Game? Systematically Auditing AI Agent Benchmarks with BenchJack

May 12, 2026
Via arXiv

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

May 06, 2026
Via arXiv

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

Apr 13, 2026
Via arXiv

Intent-aligned Formal Specification Synthesis via Traceable Refinement

Apr 12, 2026
Via arXiv

SecPI: Secure Code Generation with Reasoning Models via Security Reasoning Internalization

Apr 04, 2026
Via arXiv

A Framework for Formalizing LLM Agent Security

Mar 19, 2026
Via arXiv